AITopics

2309.16977

Country: Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)

Genre: Research Report (1.00)

Industry: Education (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Grier, Daniel, Pashayan, Hakop, Schaeffer, Luke

Sample-optimal classical shadows for pure states

arXiv.org Artificial Intelligence, Nov-21-2022

We consider the classical shadows task for pure states in the setting of both joint and independent measurements. The task is to measure few copies of an unknown pure state $\rho$ in order to learn a classical description which suffices to later estimate expectation values of observables. Specifically, the goal is to approximate $\mathrm{Tr}(O \rho)$ for any Hermitian observable $O$ to within additive error $\epsilon$ provided $\mathrm{Tr}(O^2)\leq B$ and $\lVert O \rVert = 1$. Our main result applies to the joint measurement setting, where we show $\tilde{\Theta}(\sqrt{B}\epsilon^{-1} + \epsilon^{-2})$ samples of $\rho$ are necessary and sufficient to succeed with high probability. The upper bound is a quadratic improvement on the previous best sample complexity known for this problem. For the lower bound, we see that the bottleneck is not how fast we can learn the state but rather how much any classical description of $\rho$ can be compressed for observable estimation. In the independent measurement setting, we show that $\mathcal O(\sqrt{Bd} \epsilon^{-1} + \epsilon^{-2})$ samples suffice. Notably, this implies that the random Clifford measurements algorithm of Huang, Kueng, and Preskill, which is sample-optimal for mixed states, is not optimal for pure states. Interestingly, our result also uses the same random Clifford measurements but employs a different estimator.

artificial intelligence, complexity, machine learning

2211.1181

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
North America > United States > Maryland > Prince George's County > College Park (0.04)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.93)

Bossens, David M., Bishop, Nicholas

Explicit Explore, Exploit, or Escape ($E^4$): near-optimal safety-constrained reinforcement learning in polynomial time

arXiv.org Artificial Intelligence, Nov-14-2021

In reinforcement learning (RL), an agent must explore an initially unknown environment in order to learn a desired behaviour. When RL agents are deployed in real world environments, safety is of primary concern. Constrained Markov decision processes (CMDPs) can provide long-term safety constraints; however, the agent may violate the constraints in an effort to explore its environment. This paper proposes a model-based RL algorithm called Explicit Explore, Exploit, or Escape ($E^{4}$), which extends the Explicit Explore or Exploit ($E^{3}$) algorithm to a robust CMDP setting. $E^4$ explicitly separates exploitation, exploration, and escape CMDPs, allowing targeted policies for policy improvement across known states, discovery of unknown states, as well as safe return to known states. $E^4$ robustly optimises these policies on the worst-case CMDP from a set of CMDP models consistent with the empirical observations of the deployment environment. Theoretical results show that $E^4$ finds a near-optimal constraint-satisfying policy in polynomial time whilst satisfying safety constraints throughout the learning process. We discuss robust-constrained offline optimisation algorithms as well as how to incorporate uncertainty in transition dynamics of unknown states based on empirical inference and prior knowledge.

cmdp, near-optimal safety-constrained reinforcement, unknown state

2111.07395

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy > Emilia-Romagna > Metropolitan City of Bologna > Bologna (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Rosenberg, Aviv, Mansour, Yishay

Stochastic Shortest Path with Adversarially Changing Costs

arXiv.org Machine Learning, Nov-5-2020

Stochastic shortest path (SSP) is a well-known problem in planning and control, in which an agent has to reach a goal state in minimum total expected cost. In this paper we consider adversarial SSPs that also account for adversarial changes in the costs over time, while the dynamics (i.e., transition function) remains unchanged. Formally, an agent interacts with an SSP environment for $K$ episodes, the cost function changes arbitrarily between episodes, and the fixed dynamics are unknown to the agent. We give high probability regret bounds of $\widetilde O (\sqrt{K})$ assuming all costs are strictly positive, and $\widetilde O (K^{3/4})$ for the general case. To the best of our knowledge, we are the first to consider this natural setting of adversarial SSP and obtain sub-linear regret for it.

algorithm, probability, transition function

2006.11561

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Nevada (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial Intelligence, Mar-9-2020

Exploring Unknown States with Action Balance

Song, Yan, Chen, Yingfeng, Hu, Yujing, Fan, Changjie

Exploration is a key problem in reinforcement learning. Recently, bonus-based methods, which assign an additional bonus (e.g., intrinsic reward) to guide the agent to rarely visited states, have achieved considerable success in environments where exploration is difficult, such as Montezuma's Revenge. Since the bonus is calculated according to the novelty of the next state after performing an action, we call such methods next-state bonus methods. However, next-state bonus methods bring extra issues. They may lead the agent to become trapped in states that have been visited less often and to neglect exploring unknown states. Moreover, the behavior policy of the agent is also indirectly influenced by the bonus added to the state (or state-action) values. In contrast to bonus-based methods, which explore in known states, in this paper we focus on the other part of exploration: finding unknown states. We propose the action balance exploration method to overcome the defects of next-state bonus methods; it balances the number of times each action is chosen in each state and can be treated as an extension of upper confidence bound (UCB) to deep reinforcement learning. To combine the advantages of the next-state bonus method and our action balance exploration method, we propose the action balance RND method, which takes both parts of exploration into consideration. Experiments on grid world and Atari games demonstrate that action balance exploration is better at finding unknown states and can improve the performance of RND in some hard exploration environments, respectively.
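
For orientation, the abstract describes action balance as an extension of upper confidence bound (UCB). The sketch below shows only the classic tabular UCB1 rule applied per state, not the paper's deep RL method; the class name `UCBActionSelector`, the constant `c`, and the use of hashable state keys are illustrative assumptions.

```python
import math
from collections import defaultdict


class UCBActionSelector:
    """Tabular per-state UCB1 action selection (illustrative, not the paper's method)."""

    def __init__(self, n_actions, c=1.0):
        self.n_actions = n_actions
        self.c = c  # exploration weight (assumed hyperparameter)
        # counts[state][action] = number of times the action was chosen in that state
        self.counts = defaultdict(lambda: [0] * n_actions)

    def select(self, state, q_values):
        counts = self.counts[state]
        # Try every action at least once before using the confidence bound.
        for action, n in enumerate(counts):
            if n == 0:
                return action
        total = sum(counts)
        # Value estimate plus a bonus that shrinks as the action is chosen more often.
        scores = [
            q_values[a] + self.c * math.sqrt(math.log(total) / counts[a])
            for a in range(self.n_actions)
        ]
        return max(range(self.n_actions), key=scores.__getitem__)

    def update(self, state, action):
        self.counts[state][action] += 1
```

With equal value estimates, the bonus term alone drives selection, so the per-state action counts stay balanced; this is the intuition the abstract attributes to balancing how often each action is chosen.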

balance exploration, exploration, unknown state

2003.04518

Country:

North America > United States > California > Los Angeles County > Long Beach (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > Canada (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games > Computer Games (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Bauer, Daniel, Kuhnert, Lars, Eckstein, Lutz

Deep, spatially coherent Inverse Sensor Models with Uncertainty Incorporation using the evidential Framework

arXiv.org Artificial Intelligence, Mar-29-2019

To perform high-speed tasks, sensors of autonomous cars have to provide as much information in as few time steps as possible. However, radars, one of the sensor modalities autonomous cars heavily rely on, often provide only sparse, noisy detections. These have to be accumulated over time to reach a high enough confidence about the static parts of the environment. For radars, the state is typically estimated by accumulating inverse detection models (IDMs). We employ the recently proposed evidential convolutional neural networks which, in contrast to IDMs, compute dense, spatially coherent inference of the environment state. Moreover, these networks are able to incorporate sensor noise in a principled way, which we further extend to also incorporate model uncertainty. We present experimental results showing that this makes it possible to obtain a denser environment perception in fewer time steps.

artificial intelligence, detection, machine learning

1904.00842

Genre: Research Report (0.40)

Industry:

Information Technology (0.86)
Automobiles & Trucks (0.54)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Rahimian, M. Amin, Jadbabaie, Ali

Learning without recall in directed circles and rooted trees

arXiv.org Machine Learning, Nov-27-2016

This work investigates the case of a network of agents that attempt to learn some unknown state of the world among finitely many possibilities. At each time step, agents all receive random, independently distributed private signals whose distributions are dependent on the unknown state of the world. However, it may be the case that some or any of the agents cannot distinguish between two or more of the possible states based only on their private observations, as when several states result in the same distribution of the private signals. In our model, the agents form some initial belief (probability distribution) about the unknown state and then refine their beliefs in accordance with their private observations, as well as the beliefs of their neighbors. An agent learns the unknown state when her belief converges to a point mass that is concentrated at the true state. A rational agent would use Bayes' rule to incorporate her neighbors' beliefs and own private signals over time. While such repeated applications of Bayes' rule in networks can become computationally intractable, in this paper we show that in the canonical cases of directed star, circle, or path networks and their combinations, one can derive a class of memoryless update rules that replicate that of a single Bayesian agent but replace the self beliefs with the beliefs of the neighbors. This way, one can realize an exponentially fast rate of learning similar to the case of Bayesian (fully rational) agents. The proposed rules are a special case of Learning without Recall.
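
As a point of reference for the abstract above, here is a minimal sketch of the single-agent Bayesian belief update over finitely many states. It is the textbook rule, not the paper's memoryless network rule; the function name `bayes_update` and the dictionary-based likelihood layout are assumptions made for illustration.

```python
def bayes_update(belief, likelihoods, signal):
    """Return the posterior belief over states after observing one private signal.

    belief      -- list of prior probabilities, one per candidate state
    likelihoods -- list of dicts, likelihoods[i][signal] = P(signal | state i)
    signal      -- the observed private signal
    """
    # Bayes' rule: posterior is proportional to prior times likelihood.
    unnormalized = [b * lik[signal] for b, lik in zip(belief, likelihoods)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("signal has zero probability under every state")
    return [u / total for u in unnormalized]
```

If two states assign the same distribution to the private signals, their belief ratio never changes under this update, which is the indistinguishability the abstract refers to and the reason neighbors' beliefs are needed.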

artificial intelligence, bayesian inference, machine learning

doi: 10.1109/ACC.2015.7171992

1611.08791

Country: North America > United States > Pennsylvania (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Shahrampour, Shahin, Rahimian, Mohammad Amin, Jadbabaie, Ali

Switching to Learn

arXiv.org Machine Learning, Mar-11-2015

Distributed estimation, detection, and learning theory in networks have attracted much attention over the past decades [1], [2], [3], [4], with applications that range from sensor and robotic networks [5], [6], [7], [8], [9] to social and economic networks [10], [11], [12]. In these scenarios, agents in a network need to learn the value of a parameter that they may not be able to infer on their own, but the global spread of information in the network provides them with adequate data to learn the truth collectively. As a result, agents iteratively exchange information with their neighbors. For instance, in distributed sensor and robotic networks, agents use local diffusion to augment their imperfect observations with information from their neighbors and achieve consensus and coordination [13], [14]. Similarly, agents exchange beliefs in social networks to benefit from each other's observations and private information and learn the unknown state of the world [15], [16]. Existing literature on distributed learning focuses mostly on environments where individuals communicate at every round. Of particular relevance to our discussion are a host of algorithms that follow the non-Bayesian learning scheme in Jadbabaie et.

artificial intelligence, bayesian inference, machine learning

1503.03517

Country: North America > United States > Pennsylvania (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Communications > Networks (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.50)

arXiv.org Machine Learning, Dec-12-2012

Reinforcement Learning with Partially Known World Dynamics

Shelton, Christian R.

Reinforcement learning would enjoy better success on real-world problems if domain knowledge could be imparted to the algorithm by the modelers. Most problems have both hidden state and unknown dynamics. Partially observable Markov decision processes (POMDPs) allow for the modeling of both. Unfortunately, they do not provide a natural framework in which to specify knowledge about the domain dynamics. The designer must either admit to knowing nothing about the dynamics or completely specify the dynamics (thereby turning it into a planning problem). We propose a new framework called a partially known Markov decision process (PKMDP), which allows the designer to specify known dynamics while still leaving portions of the environment's dynamics unknown. The model represents not only the environment dynamics but also the agent's knowledge of the dynamics. We present a reinforcement learning algorithm for this model based on importance sampling. The algorithm incorporates planning based on the known dynamics and learning about the unknown dynamics. Our results clearly demonstrate the ability to add domain knowledge and the resulting benefits for learning.

artificial intelligence, machine learning, reinforcement learning

1301.0601

Country: North America > United States (0.46)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Neural Information Processing Systems, Dec-31-1994

Probabilistic Anomaly Detection in Dynamic Systems

Smyth, Padhraic

This paper describes probabilistic methods for novelty detection when using pattern recognition methods for fault monitoring of dynamic systems. The problem of novelty detection is particularly acute when prior knowledge and training data only allow one to construct an incomplete classification model. Allowance must be made in model design so that the classifier will be robust to data generated by classes not included in the training phase. For diagnosis applications, one practical approach is to construct both an input density model and a discriminative class model. Using Bayes' rule and prior estimates of the relative likelihood of data of known and unknown origin, the resulting classification equations are straightforward.
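
One way to read the last two sentences of the abstract above is sketched below: an input density model scores how likely the data is under the known classes, Bayes' rule converts that into a probability of known origin, and the discriminative class posteriors are scaled by it. This is an illustrative reading under stated assumptions, not the paper's exact equations; the function name, argument names, and the `"unknown"` label are hypothetical.

```python
def posterior_with_novelty(p_x_known, p_x_unknown, prior_known, class_probs):
    """Combine an input density model with a discriminative class model.

    p_x_known   -- density of the input under the known-class model, p(x | known)
    p_x_unknown -- assumed density of the input under unknown origin, p(x | unknown)
    prior_known -- prior probability that data comes from a known class
    class_probs -- dict of discriminative posteriors p(class | x, known), summing to 1
    """
    joint_known = p_x_known * prior_known
    joint_unknown = p_x_unknown * (1.0 - prior_known)
    evidence = joint_known + joint_unknown
    if evidence <= 0:
        raise ValueError("input has zero density under both hypotheses")
    # Bayes' rule: probability that the input originates from a known class.
    p_known = joint_known / evidence
    # Scale each known-class posterior and reserve the remainder for novelty.
    posterior = {label: p * p_known for label, p in class_probs.items()}
    posterior["unknown"] = 1.0 - p_known
    return posterior
```

An input that is unlikely under the known-class density pushes most of the posterior mass to the `"unknown"` entry, regardless of how confident the discriminative model is.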

detection, probabilistic anomaly detection, probability

Neural Information Processing Systems

Country:

North America > United States > California > San Mateo County > San Mateo (0.05)
North America > United States > California > Los Angeles County > Pasadena (0.05)
Oceania > Australia (0.04)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.32)